30 research outputs found

    Learning Geometric Concepts with Nasty Noise

    We study the efficient learnability of geometric concept classes - specifically, low-degree polynomial threshold functions (PTFs) and intersections of halfspaces - when a fraction of the data is adversarially corrupted. We give the first polynomial-time PAC learning algorithms for these concept classes with dimension-independent error guarantees in the presence of nasty noise under the Gaussian distribution. In the nasty noise model, an omniscient adversary can arbitrarily corrupt a small fraction of both the unlabeled data points and their labels. This model generalizes well-studied noise models, including the malicious noise model and the agnostic (adversarial label noise) model. Prior to our work, the only concept class for which efficient malicious learning algorithms were known was the class of origin-centered halfspaces. Specifically, our robust learning algorithm for low-degree PTFs succeeds under a number of tame distributions -- including the Gaussian distribution and, more generally, any log-concave distribution with (approximately) known low-degree moments. For LTFs under the Gaussian distribution, we give a polynomial-time algorithm that achieves error $O(\epsilon)$, where $\epsilon$ is the noise rate. At the core of our PAC learning results is an efficient algorithm to approximate the low-degree Chow parameters of any bounded function in the presence of nasty noise. To achieve this, we employ an iterative spectral method for outlier detection and removal, inspired by recent work in robust unsupervised learning. Our aforementioned algorithm succeeds for a range of distributions satisfying mild concentration bounds and moment assumptions. The correctness of our robust learning algorithm for intersections of halfspaces makes essential use of a novel robust inverse independence lemma that may be of broader interest.
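    The iterative spectral outlier-removal step mentioned in the abstract can be conveyed with a minimal sketch (not the paper's actual algorithm, which filters with respect to low-degree moments and comes with formal guarantees): repeatedly estimate the empirical covariance of the sample and, while its top eigenvalue exceeds the bound expected for clean data, discard the points with the largest projections onto the top eigendirection. The `tolerance` and `remove_frac` values below are illustrative choices, not parameters from the paper.

```python
import numpy as np

def spectral_filter(points, clean_bound, tolerance=1.5, remove_frac=0.01, max_iters=100):
    """Iteratively remove points whose projections onto the top eigendirection
    of the centered empirical second-moment matrix are abnormally large.

    points      : (m, d) array of possibly corrupted samples
    clean_bound : upper bound on the top covariance eigenvalue expected for
                  clean data (e.g. about 1 per direction for N(0, I))
    """
    pts = points.copy()
    for _ in range(max_iters):
        centered = pts - pts.mean(axis=0)
        cov = centered.T @ centered / len(pts)
        eigvals, eigvecs = np.linalg.eigh(cov)
        top_val, top_dir = eigvals[-1], eigvecs[:, -1]
        # Stop once no direction has suspiciously large variance.
        if top_val <= tolerance * clean_bound:
            break
        # Score each point by its squared projection onto the top direction
        # and discard the most extreme fraction.
        scores = (centered @ top_dir) ** 2
        keep = scores.argsort()[: int(len(pts) * (1 - remove_frac))]
        pts = pts[keep]
    return pts
```

    The point of the sketch is only the filter-until-the-spectrum-looks-clean loop: an adversary that shifts a small fraction of points enough to bias the low-degree moment estimates necessarily creates a large eigenvalue that the filter can detect.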

    Optimal Testing of Discrete Distributions with High Probability

    We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0 < \epsilon, \delta < 1$, we want to distinguish {\em with probability at least $1-\delta$} whether these distributions satisfy $\mathcal{P}$ or are $\epsilon$-far from $\mathcal{P}$ in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to $\delta = \Omega(1)$), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds. Here we study the following broad question: For a given property $\mathcal{P}$, can we {\em characterize} the sample complexity of testing $\mathcal{P}$ as a function of all relevant problem parameters, including the error probability $\delta$? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal sized samples.
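    For context, here is a minimal sketch of the kind of constant-confidence ($\delta = \Omega(1)$) closeness tester the abstract contrasts with, not the high-probability tester of this paper. Under Poissonized sampling with counts $X_i \sim \mathrm{Poisson}(m p_i)$ and $Y_i \sim \mathrm{Poisson}(m q_i)$, each term of $\sum_i \big((X_i - Y_i)^2 - X_i - Y_i\big)$ has expectation $m^2 (p_i - q_i)^2$, so the sum estimates $m^2 \|p - q\|_2^2$ and can be thresholded. The sample size and example distributions below are illustrative, not the paper's optimal bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

def closeness_stat(x_counts, y_counts):
    """Unnormalized closeness statistic on Poissonized counts.

    For X_i ~ Poisson(m * p_i), Y_i ~ Poisson(m * q_i), each term has
    expectation m^2 * (p_i - q_i)^2, so the sum estimates m^2 * ||p - q||_2^2.
    """
    x = np.asarray(x_counts, dtype=float)
    y = np.asarray(y_counts, dtype=float)
    return np.sum((x - y) ** 2 - x - y)

def sample_counts(p, m):
    """Poissonized sampling: draw Poisson(m * p_i) counts per domain element."""
    return rng.poisson(m * np.asarray(p))

# Toy usage: p = q (uniform) versus p far from q in total variation.
n = 1000
m = 20000                       # illustrative sample size, not an optimal bound
uniform = np.full(n, 1.0 / n)
far = uniform.copy()
far[: n // 2] *= 1.5            # shift mass so that ||uniform - far||_1 = 0.5
far[n // 2 :] *= 0.5

print(closeness_stat(sample_counts(uniform, m), sample_counts(uniform, m)))  # near 0
print(closeness_stat(sample_counts(uniform, m), sample_counts(far, m)))      # far above 0
```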

    On the Complexity of Optimal Lottery Pricing and Randomized Mechanisms


    The Fourier Transform of Poisson Multinomial Distributions and its Algorithmic Applications

    An $(n, k)$-Poisson Multinomial Distribution (PMD) is a random variable of the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random vectors supported on the set of standard basis vectors in $\mathbb{R}^k$. In this paper, we obtain a refined structural understanding of PMDs by analyzing their Fourier transform. As our core structural result, we prove that the Fourier transform of PMDs is {\em approximately sparse}, i.e., roughly speaking, its $L_1$-norm is small outside a small set. By building on this result, we obtain the following applications:

    {\bf Learning Theory.} We design the first computationally efficient learning algorithm for PMDs with respect to the total variation distance. Our algorithm learns an arbitrary $(n, k)$-PMD within variation distance $\epsilon$ using a near-optimal sample size of $\widetilde{O}_k(1/\epsilon^2)$, and runs in time $\widetilde{O}_k(1/\epsilon^2) \cdot \log n$. Previously, no algorithm with a $\mathrm{poly}(1/\epsilon)$ runtime was known, even for $k = 3$.

    {\bf Game Theory.} We give the first efficient polynomial-time approximation scheme (EPTAS) for computing Nash equilibria in anonymous games. For normalized anonymous games with $n$ players and $k$ strategies, our algorithm computes a well-supported $\epsilon$-Nash equilibrium in time $n^{O(k^3)} \cdot (k/\epsilon)^{O(k^3 \log(k/\epsilon)/\log\log(k/\epsilon))^{k-1}}$. The best previous algorithm for this problem had running time $n^{(f(k)/\epsilon)^k}$, where $f(k) = \Omega(k^{k^2})$, for any $k > 2$.

    {\bf Statistics.} We prove a multivariate central limit theorem (CLT) that relates an arbitrary PMD to a discretized multivariate Gaussian with the same mean and covariance, in total variation distance. Our new CLT strengthens the CLT of Valiant and Valiant by completely removing the dependence on $n$ in the error bound.

    Comment: 68 pages, full version of STOC 2016 paper.
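    The Fourier-analytic view in the abstract rests on the fact that, because the $X_i$'s are independent, the Fourier transform (characteristic function) of a PMD factors as a product of $n$ simple terms: $\widehat{X}(\xi) = \prod_{i=1}^n \sum_{j=1}^k p_{i,j} e^{2\pi i \xi_j}$. A minimal sketch of this identity, where the row-stochastic matrix `P` of per-coordinate probabilities $p_{i,j}$ is hypothetical input and not the paper's notation:

```python
import numpy as np

def pmd_fourier(P, xi):
    """Fourier transform of the (n, k)-PMD defined by the n x k matrix P,
    whose i-th row is the distribution of X_i over the k basis vectors,
    evaluated at a frequency vector xi.

    Independence gives E[exp(2*pi*1j * <xi, X>)] =
        prod_i sum_j P[i, j] * exp(2*pi*1j * xi[j]).
    """
    P = np.asarray(P, dtype=float)
    phases = np.exp(2j * np.pi * np.asarray(xi))   # e^{2 pi i xi_j}, one per basis vector
    return np.prod(P @ phases)

# Monte Carlo check of the product formula on a random PMD.
rng = np.random.default_rng(1)
n, k = 50, 3
P = rng.dirichlet(np.ones(k), size=n)              # random row-stochastic matrix
xi = np.array([0.1, 0.25, 0.4])

exact = pmd_fourier(P, xi)

draws = 20000
totals = np.zeros((draws, k))
for row in P:                                      # accumulate X = sum_i X_i per draw
    totals += rng.multinomial(1, row, size=draws)
empirical = np.mean(np.exp(2j * np.pi * totals @ xi))

print(exact, empirical)   # should roughly agree (Monte Carlo error ~ 1/sqrt(draws))
```

    The paper's structural result is about where this product has non-negligible magnitude; the sketch only shows the product structure that makes such Fourier arguments tractable.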

    How Good is the Chord Algorithm?
